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Abstract 

Background: Advances in modern high-throughput techniques of molecular biology have enabled top-down 
approaches for the estimation of parameter values in metabolic systems, based on time series data. Special among 
them is the recent method of dynamic flux estimation (DFE), which uses such data not only for parameter 
estimation but also for the identification of functional forms of the processes governing a metabolic system. DFE 
furthermore provides diagnostic tools for the evaluation of model validity and of the quality of a model fit beyond 
residual errors. Unfortunately, DFE works only when the data are more or less complete and the system contains as 
many independent fluxes as metabolites. These drawbacks may be ameliorated with other types of estimation and 
information. However, such supplementations incur their own limitations. In particular, assumptions must be made 
regarding the functional forms of some processes and detailed kinetic information must be available, in addition to 
the time series data. 

Results: The authors propose here a systematic approach that supplements DFE and overcomes some of its 
shortcomings. Like DFE, the approach is model-free and requires only minimal assumptions. If sufficient time series 
data are available, the approach allows the determination of a subset of fluxes that enables the subsequent 
applicability of DFE to the rest of the flux system. The authors demonstrate the procedure with three artificial 
pathway systems exhibiting distinct characteristics and with actual data of the trehalose pathway in Saccharomyces 
cerevisiae. 

Conclusions: The results demonstrate that the proposed method successfully complements DFE under various 
situations and without a priori assumptions regarding the model representation. The proposed method also permits 
an examination of whether at all, to what degree, or within what range the available time series data can be validly 
represented in a particular functional format of a flux within a pathway system. Based on these results, further 
experiments may be designed to generate data points that genuinely add new information to the structure 
identification and parameter estimation tasks at hand. 

Keywords: Biochemical systems theory, Dynamic flux estimation, Metabolic pathways, Parameter estimation, 
Structure identification, Time series data 



Background 

A grand challenge of biomathematical modeling is the 
conversion of a biological system into a computational 
structure that formalizes the underlying system. An im- 
portant and very challenging component of this process 
is the estimation of parameter values. The task is typic- 
ally pursued with one of two generic approaches, namely 
a forward (bottom-up) or an inverse (top-down) method. 
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Until recently, essentially all models of metabolic path- 
way systems were developed according to the first strat- 
egy, that is, by characterizing model components and 
processes one at a time and subsequently merging all 
"local" information about kinetic reaction steps into one 
comprehensive dynamic model. Although this forward 
approach is theoretically straightforward, implementa- 
tion procedures often fail and, moreover, have intrinsic 
disadvantages [1]. For instance, the necessary informa- 
tion is usually obtained in vitro and from different 
experiments, so that there is no guarantee that it is en- 
tirely compatible and consistent. 
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The second, top-down approach uses data that characterize 
the entire system and attempts to estimate all parameter 
values at once with a sophisticated optimization algo- 
rithm. Specifically, this type of method employs time 
series data that describe the full dynamic response of a 
pathway system to some stimulus, such as an environ- 
mental stress {e.g., heat shock) or the availability of food 
(e.g., glucose uptake). In contrast to the local data 
obtained from traditional experiments, the great appeal 
of using these types of "global" data is that most, if not 
all, measurements are taken from the same biological 
system under the same conditions. Furthermore, these 
data contain enormous and essentially unadulterated in- 
formation about the structure, dynamics and regulatory 
mechanisms that govern the biological system under in- 
vestigation. The main drawback of top-down 
approaches is that the actual extraction and integration 
of this information into fully functional, explanatory 
models is challenging. In fact, more than one hundred 
articles addressing this task appeared within the past 
ten years. They focused on various aspects of the esti- 
mation process, but most of them were dedicated to the 
main issue of optimizing parameter values against the 
observed time series data (e.g., [2,3]). 

Whether a forward or inverse approach is used, the es- 
timation of parameter values necessitates assumptions 
regarding the functions or rate laws that describe the 
reactions of interest. As a prominent example, the typ- 
ical default for enzymatic reactions in a metabolic path- 
way is the Michaelis-Menten rate law (MMRL) or one of 
its variations. While such assumptions are understand- 
able, they create an immediate conundrum. Namely, the 
true mechanisms governing a biological process are in 
reality unknown or at least unclear. As a result, the 
estimation process is from the start unguided, uncertain, 
or maybe even based on modestly or entirely wrong 
assumptions. Also, descriptions of more complex en- 
zyme mechanisms contain numerous parameters if sev- 
eral substrates or reactions are involved, so that the 
alleged functions cannot be identified from the typically 
sparse data [4,5]. 

In addition to the troublesome issue of model selec- 
tion, most proposed methods for estimation from time 
series data face significant problems related to the data 
themselves, to inefficient algorithms, and to a variety of 
computational issues. To complicate matters further, 
these issues are usually superimposed. The data may be 
overly noisy, incomplete, collinear with each other, 
or non-informative. The computational algorithms are 
often slow to converge, converge to a locally but not glo- 
bally optimal solution, or do not converge at all. Finally, 
there is a mathematical issue, especially for systems with 
many parameters, namely that a system may admit solu- 
tions that are distinctly different yet equivalent, or 



essentially equivalent, with respect to the residual error. 
This type of result, referred to as sloppiness and uniden- 
tifiability, may be due to redundancies in candidate 
parameter sets and has received much attention in re- 
cent times [6-8]. 

A different type of sloppiness may be caused by the 
fact that different model structures may give essentially 
identical residual errors. For instance, several probability 
density functions often model the same data equally well 
[9]. Moreover, two "wrong" structures or representations 
within a model may permit compensation of errors be- 
tween different terms or equations [10]. It is not even 
clear whether the residual error (SSE) is always the best 
metric for the quality of fit [11]. For instance, the "best" 
models in terms of having the smallest SSE tend to have 
too many parameters and therefore encounter over- 
fitting problems. This issue can be serious, because an 
over-fitted model often lacks the capacity of extrapola- 
tion and predictive power with respect to data not used 
in the estimation or untested conditions. Therefore, it is 
necessary to develop tools for the evaluation of model 
validity and quality beyond residual errors. For instance, 
one should establish criteria to determine the appropri- 
ateness of the chosen mathematical representations, 
develop methods for assessing whether residual errors 
are due to idiosyncrasies or noise in the data, and de- 
velop diagnostic tools of discriminating between valid 
and invalid model structures. 

Recently we proposed a novel approach to metabolic 
systems estimation, called Dynamic Flux Estimation 
(DFE), that ameliorates several of the issues listed above 
[10]. DFE is executed in two distinct phases. The first 
phase consists of an entirely model-free data analysis 
that requires minimal assumptions and reveals inconsist- 
encies within the data, and between data and the alleged 
system topology. Generally, the system is represented as 
a set of ordinary differential equations (ODEs) so that 
the instant change in each metabolite (i.e., its derivative) 
equals the sum of fluxes that enter or leave the metabol- 
ite pool: 

= Xi = '^Influxes — ^^Effluxes . 

The left-hand side of this ODE can be interpreted as the 
slope of the time course of the variable X t at a given 
point in time. Therefore, assuming that the time series 
data are more or less complete and smooth— or can be 
validly smoothed (see later) —one can estimate the slope 
of the time course at each time point and substitute the 
slopes for the derivatives. If the system contains N equa- 
tions, and if data are measured at K time points, this 
substitution decouples the system of N differential equa- 
tions into N sets of K algebraic equations each. This 
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system is linear in the fluxes and can be assessed with 
methods of linear algebra. In particular, it is easily solved 
at each time point if the system has full rank. 

The result of this first phase of DFE is a representation 
of each flux as a numerically characterized function of 
time and as a function of all contributing metabolites. 
This representation is not explicit, but purely numerical 
and consists of points in plots of flux vs. time or flux vs. 
metabolites and modulators. The second phase of DFE 
addresses the mathematical formulation of each process 
in the biological system by attempting to convert these 
numerical plots into mathematical representations, such 
as a Michaelis-Menten or Hill rate law or a power-law 
description. In contrast to most other methods, where a 
functional form had to be assumed a priori, this step 
allows quantitative diagnostics of whether a candidate of 
a mathematical formulation may be appropriate, at least 
within certain ranges of the contributing variables. The 
subsequent determination of parameter values is now 
much easier, because it involves explicit functions that 
are addressed one flux at a time. 

DFE offers substantial advantages. It makes almost no 
assumptions and is straightforward if the right data are 
available. It reveals inconsistencies within the data, 
avoids compensation among and within equations, and 
permits quantitative diagnostic tools of whether the 
assumed mathematical formulations are appropriate or 
in need of improvement. In addition, since DFE identi- 
fies parameters based on explicit single-flux representa- 
tions, the estimation of parameter values is much easier 
and more reliable than in other top-down approaches. 
As a result, DFE promises significantly improved ex- 
trapolation capacity toward new data or experimental 
conditions. 

Alas, DFE also has limitations and drawbacks. First, it 
requires more or less complete time series data that 
characterize the investigated system. These data are still 
relatively seldom, although they are being generated at 
an increasing rate and with rapidly improving quality. 
Second, and arguably more limiting, a unique solution 
of the flux equations in the first phase of DFE is only 
possible if the flux system is of full rank. However, most 
actually pathway systems contain more fluxes than 
metabolites and are therefore underdetermined. 

Several constraint-based optimization techniques have 
been proposed for stoichiometric analyses of underdeter- 
mined metabolic systems [12]. They have become a main- 
stay of flux balance analysis (FBA [13]) and work well 
under steady-state and pseudo-steady-state (PSS) assump- 
tions [14-17]. Mahadevan and co-workers [18] extended 
the traditional FBA to account for dynamics and presented 
two different formulations: the dynamic optimization ap- 
proach (DOA) and the static optimization approach 
(SO A). DOA involves optimization over the entire time 



period to obtain flux profiles over time, while SOA 
involves dividing the batch time into several time intervals 
and solving the instantaneous optimization problem at the 
beginning of each time interval. These methods basically 
are variations of FBA and need, for the determination of 
flux profiles at each time point, constraints and objective 
functions, which describe some goal the cell aims to reach. 
For the case of microbial systems, a reasonable objective 
may be maximization of the growth rate. However, deter- 
mining an unbiased objective function in a eukaryotic 
system is often difficult. 

In contrast to these methods that require objective 
functions, we proposed extending DFE with the infusion 
of additional information [19]. We distinguished four 
cases. First, the connectivity of the systems is not fully 
known or some of the connections are uncertain. Sec- 
ond, some of the time series data were not measured, al- 
though it is known how the corresponding metabolites 
are involved in the pathway. Third, the system contains 
"missing" metabolites which are neither known nor mea- 
sured, but in actuality affect the system significantly. 
And fourth, the flux system is underdetermined, even 
though the time series of all relevant metabolites are 
measured. 

The first issue might be ameliorated by methods devel- 
oped for structure identification of unknown of ill- 
characterized pathways. These methods include a wide 
spectrum of techniques, such as perturbation methods, 
causality models, correlation-based approaches, or prob- 
abilistic models, some of which are based on time series 
data (see [2] for review). The lack of certain data in the 
second case could be complemented by deducing the 
unknown time profiles from time series of neighboring 
metabolites, if the corresponding enzymatic information 
is available for fluxes producing and degrading a metab- 
olite in the equation. However, this approach of using 
kinetic information obtained in vitro, or maybe even 
from different organisms, is naturally problematic due to 
some degree of bias and uncertainty. A possible solution 
strategy for the third case is to check the mass balance 
in the entire system throughout the time period. If 
significant changes in mass balance are observed, add- 
itional biological insight will be needed to check the 
pathway model and identify possible sources of leakage 
or gain of mass. If the masses are more or less balanced, 
it is still possible that important fluxes or metabolites 
are missing. However, there is currently no obvious 
defense in this situation. Finally, to complement an 
underdetermined flux system, some of the fluxes need to 
be estimated with information from other sources. For 
instance, it might be possible to obtain fluxes directly 
from experiments, but such data are rare. As an alterna- 
tive, one may assume the functional form for an enzym- 
atic reaction, and if corresponding kinetic information is 
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available, for instance from BRENDA [20], parameter 
values may be estimated for this functional form. As a 
variation on this strategy, one could assume some 
canonical model, such as power-law functions [21] or 
lin-log approximations [22,23], if some of the variables 
and fluxes operate within relatively small ranges. Clearly, 
this option runs counter to the model-free nature of 
DFE, but might be the only feasible solution. Instead of 
using kinetic information, one could also select some of 
the decoupled equations and use optimization methods 
to fit the selected model to the time series data 
(e.g. [24,25]). 

Although we presented proof of concept that the differ- 
ent approaches described above can be used to supple- 
ment DFE, these approaches are not always optimal, 
because they require additional information and assump- 
tions that are a priori not validated. The question thus 
arises: can we directly squeeze additional information out 
of the time series data, without the need of further 
assumptions and additional information? And if so, 
under what conditions is that possible? Providing at least 
partial answers to these questions is the topic of this 
article. 

Specifically, we propose here a distinct approach to sup- 
plementing DFE with information hidden in suitable meta- 
bolic time series. Extracting this information permits the 
determination of a sufficient subset of fluxes to execute 
DFE on the rest of the flux system. In contrast to all other 
solutions presented so far for the complementation of DFE, 
the method proposed here does not require any assump- 
tions regarding the mathematical representation of the 
fluxes. Furthermore, kinetic information or knowledge of 
the functional forms of the enzymatic reactions is not 
required. We will demonstrate in the following that the 
proposed method can succeed even if some of the time 
series data are not measured or when there is mass leakage 
in the pathway systems. In addition, the new method allows 
us to address a recurring unanswered question, namely 
how many time series data are needed to estimate the 
structure and parameters of a system. 

Specific details of the proposed approach are presented 
in the Methods section. While the methodological details 
require some technical discussion, the concept of the pro- 
posed method may be best explained with the following 
simple example. Suppose a metabolic system contains the 
equation X t = vf(Xf) — v^pQ), which is typical for a reac- 
tion between Xj and X if combined with the degradation of 
Xi within a linear section of a pathway system: 

Suppose we have time series data, so that we can estimate 
Xi for every measured time point with sufficient accuracy. 
Suppose further that the time series data are such that we 



have m time points (in the same or in different datasets) 
where X t has the same value (e.g., q), whereas Xj has a dif- 
ferent value at each of these time points. It is reasonable to 
assume that is a function in a strict mathematical sense, 
which means that v i ~(c i ) has one unique (although yet un- 
known) value vci. If so, we have m equations of the type 
Xi = v~l(Xj) — , where the values of Xj and X t are 
known directly from the data and vc t always has the same 
value. Using these quantities, the methods proposed here 
allow us to estimate the functional format of v~}~(Xj), at least 
over some pertinent range of Xj values. Once we know 
v^~(Xj), we can determine vj(Xi). Thus, we now have nu- 
merically quantified two fluxes, which reduce the discrep- 
ancy between the number of equations and the number of 
independent fluxes. Repeated application of the method 
allows DFE for the entire system. An illustrative v\ example 
is shown in Methods and other examples are presented in 
the Results. If the function depends on more than one vari- 
able, the procedure is the same in concept but more com- 
plicated in detail (see Additional file 1). 

Methods 

The proposed method offers a systematic strategy to extend 
DFE and to ameliorate its limitations. Just like DFE, the 
proposed method starts with an optional data preproces- 
sing step, but without any assumption regarding the func- 
tional formats of the fluxes in the system. First, the 
experimental data are tested for mass conservation to make 
sure no mass is lost or gained during the observed time 
period. If the data do indicate losses or gains in mass, it is 
useful to locate possible branches off the main pathway(s) 
and to account for the changes in total mass of the metabo- 
lites in the pathway [19]. Second, the time series data are 
smoothed as necessary, which makes it easier to estimate 
the slopes of all time courses at a given number of time 
points, using different numerical techniques. These estab- 
lished smoothing methods include splines, artificial neural 
networks, as well as different types of filters, such as the 
popular Kalman, Savitzky-Golay, Whittaker, or Eilers filter 
(see [2,26,27] for applicable methods). In parallel to these 
data preprocessing steps, the pathway structure (the system 
topology) is used to generate a system of symbolic equa- 
tions describing the dynamics of the system. The generic 
format for such a representation may be written as 

(1) 

where X t denotes the concentration or amount of a variable 
or variable pool and n is the total number of time- 
dependent variables in the system. The functions and 
Vf represent reactions or fluxes entering or leaving the 
quantity X b respectively. Substituting slope estimates for 
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the differentials in this system of equations decouples the 
ordinary differential equations (ODEs) and results in a sys- 
tem of fluxes that is linear at each time point t [21,28,29]. 
The algebraic equations may be represented in matrix for- 
mat as 

s($)=Nxv($), /=l,...,tf, (2) 

where s is a vector of slopes, N is the stoichiometric matrix, 
v is a vector of fluxes, and K is the number of time points 
ty t<z. . tp. . t K where measurements are available. 

Next we check the rank of the linear set of algebraic 
equations in Eq. (2). The system can be easily solved at 
each time step to obtain dynamic profiles of all fluxes if 
the system has full rank. Over-determined systems may be 
solved by pooling fluxes, the use of pseudo-inverse meth- 
ods, or regression. However, if the system is underdeter- 
mined, the solution space is infinite. To overcome this 
issue, some of the fluxes need to be estimated independ- 
ently, until the system has full rank and can be solved 
uniquely. Elsewhere we showed that additional informa- 
tion maybe used to characterize selected fluxes [19]. Here, 
the goal is to estimate some fluxes directly from the time 
series data, without evoking other sources of information. 

As an introductory example, consider a linear part of a 
pathway with feedback inhibition as shown in Figure 1(a). 
The equations that describe the system in terms of fluxes are 

X\ — V\ — v 2 

X 2 = v 2 -v 3 . (3) 
X 3 = v 3 - v 4 

The system could be part of a larger pathway system, but for 
this illustration the context is not relevant. For the illustra- 
tion, fluxes were generated with a mix of power-law and Hill 
functions, namely 

vi = 1.5 6 
v 2 = 2AX° l 8 

^maxXf (4) 

v 4 = 2I 3 075 

where V max = 5 and K M = 2. We use these settings to create 
artificial data, but subsequently assume no knowledge of the 
functions or parameters in Eq. (4). 

Suppose time series data were measured and they are 
without noise (Figure 1(b)). Eq. (3) immediately indicates 
that the flux system is underdetermined and therefore 
has infinitely many solutions. A unique solution could 
be obtained if one of the fluxes could be determined in- 
dependently. To achieve this independent determination, 
one may choose any one of the three equations in the 
system. For ease of computation, one will typically prefer 
an equation that contains as few fluxes and as few sub- 
strates and modulators as possible. In this linear system, 
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Figure 1 (a) Generic three-variable linear pathway with 




feedback inhibition (Eqs. (3-4)). (b) Time series data, consisting of 


50 artificial "measurements" that were generated with initial 




conditions X ] (t 0 ) = 5, X 2 {t 0 ) = 0.1 , and X 3 {t 0 ) = 8; X ] , X 2 , X 3 are 




represented by blue, green, and orange dots, respectively. 





all equations have two fluxes and each of them depends 
on only one metabolite, so that there is no advantage to 
choosing one equation rather than another. 

Generically, we intend to solve the fluxes in the i th 
equation, which here happens to have only two fluxes, 
namely one influx (v in ) going into the pool X b and one 
efflux (v out ) leaving this pool. The flux v in depends only 
on the precursor X in of X t and v out depends only on X t 
itself; to minimize confusion, we call this variable gener- 
ically X out . Extracting the i th equation from Eq. (1), we 
thus obtain, in general terms, 

Xi = V in ~ Vout- (5) 

The functional form of neither flux is assumed to be 
known. Substitution of derivatives with slopes results in 
K equations of the type 

Si(tj) = v in (tj) - v out (tj) , j=l,...,K. (6) 

As a specific illustration, consider the second equation 
(X 2 = v 2 — V3) in Eq. (3), where v 2 depends only on the 
precursor Xi and v 3 depends only on X 2 . We substitute 
the derivative X 2 with slopes that can be measured dir- 
ectly from the time series data, possibly upon smooth- 
ing. For a total of 50 time points, one thus obtains 50 
algebraic equations of the type 

X 2 w S 2 (tj) = v 2 (tj) - v 3 (tj) , ; = 1, . . . , 50. (7) 
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It is reasonable to assume that the in- and effluxes are 
true functions in a mathematical sense. Thus, since v in 
depends only on X iw v in must have one and only one 
value for every given value of X in . In particular, if X in 
assumes the same value at two different time points, v in 
must have the same (so far unknown) value at both time 
points as well In the illustration example, v 2 depends 
only on X v Thus, for every value of X x there is one and 
only one value of v 2 . The proposed method therefore 
requires a screening of the available datasets with the 
goal of identifying different situations where X in has 
some fixed value X in _ const . For all these situations, v in also 
has some fixed value v in _ const . Since we do not know the 
functional form of v iw we cannot directly compute this 



value v h 



t . However, we do know that this value is 



very similar for all situations where X in « X in _ const . Thus, 
for the set of all X in « X in _ const) Eq. (6) has the form 



Stiff) — v in -const (tj) ~ v out{tj) 



(8) 



In the illustrative example, we screen the available data 
sets and search for different situations where X x has the 
same fixed value X lc and, thus, v 2 also has the same (yet 
unknown) value v 2c . Thus, for the entire set of all X x « 
Xic the second system equation has the form 



S2(tj)=V 2c (tj)-V 3 (tj). 



(9) 



For instance, X x has similar values (-0.26) at time points 
4, 4.8, 8.8, and 9.2, while X 2 has different values at these 
time points (Figure 2(a)). 

We repeat this type of screening for different sets of 
the same or very similar values of X in . The result is a set 



of sets with equal X ir 



values within each set but dif- 



ferent X in const values for different sets. These sets form 
a histogram with a bin for each X in _ const . If the range of 
each bin is small enough, we can assume every X in in 
the same bin to have very similar values, so that their 
corresponding v in _ const are also very similar. Henceforth, 
we only retain bins with at least two entries. An example 
in the illustrative example consists of time points 3.4 
and 9.6, where X x has again similar values. In this case, 
the value is -1.26, which is different from the value we 
screened before. Similarly, for time points 1, 8.4, and 9.4, 
X x has a value of -0.6 (Figure 2(a)). Figure 2(b) shows 
many situations in the dataset where X x has approxi- 
mately some fixed value, and these sets of X x are 
reflected in a "bin database of values." Within each bin, 
the corresponding value of v 2c is very similar as well. 

Suppose we have identified P bins that contain at least 
two X in . For these bins we determine the corresponding 
X out values, which are typically different from each other. 
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Figure 2 (a) Fixing Xi within a narrow range (-0.26), four 
instances of are found (solid red circles). Fixing X } within 
another narrow range (-0.6) provides three instances of X } (solid 
orange circles). Similarly, two instances of X ] are found for X ] -1.26 
(solid blue circles), (b) Collection of 34 "bins" that exhibit the 
number of times X ] has approximately the same value given on the 
x-axis; the range of each bin was chosen as 0.05. Among the 34 
bins, 9 bins have at least two instances of the same Xy, all other bins 
are discarded, (c) Representation of different X 2 values 
corresponding to at least two X ] values in each of the 9 remaining 
bins. The bars connect the two or more X 2 values in each bin. 
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Suppose that bin p contains q values. Thus, we obtain q 
equations of the type 

Si(bin p ) = v in _ const (bin p ) - v out {bin p ), p = l,...,P, 

(10) 

where v in const (bin p ) always has the same value, but 5/ 
(birip) and v out (bin p ) have different values. For our 
illustration we specify nine bins (P = 9) (which have at 
least two Xi (Figure 2(b)), and their corresponding 
values of X 2 at the same time points are shown in 
Figure 2(c). The 6 th of the nine bins (shown as the 
orange bin in Figure 2(a)) contains three instances of X v 
Therefore, we obtain three equations of the type 

S 2 (bin 6 ) = v 2c (bin 6 ) - v 3 (bin 6 ). (11) 

Equation (10) is formulated analogously for each bin p 
=1, . . ., P. In each case, v out {bin p ) can be represented as 
at least two equations of the type 

v 0 ut(bin p ) = v in _ comt (bin p ) - Si(bin p ), p = l,...,P. 

(12) 

Since we do not know the functional form of v iw we do 
not know the numerical value of v in _ const (bin p ). However, 
since v in const (bin p ) is a constant for each bin, the relative 
positions of a group of values of v out (bin p ) depend on 
each value -Si (bin p ) within a given bin, and the slope 
values can be measured directly from the time series data. 
In addition, since v out {bin p ) is solely determined by X out 
(bin p ), we can characterize the relative positions of a set of 
X ou t (birip) and their corresponding values -Si (bin p ). Col- 
lecting these relationships, we can establish a plot of X out 
(bin p ) versus -Si (bin p ). If the bin contains only two points 
of X out) we consider them as a pair and link them with a 
connecting line. If the bin contains q points of X out (where 
q > 2), we sort X out based on their values and connect 



every two adjacent points as a pair to form a total of q-\ 
pairs. In order to address these pairs, we use an additional 
index for the position of each pair in each bin, such as 
(X out (pair r )(l), -Si (pair r )(l)) for the first point and (X out 
(pair r ) (2), -Si (pair r ) (2)) for the second point, where 
r=l, q-1. 

To continue the illustration, the 8 th bin contains two 
instances of X x -1.26. The corresponding values of X 2 
are 1.54 and 2.93, and the -S 2 values are -1.35 and 0.93, 
respectively. The points in the plot of X 2 (bin 8 ) versus 
-S 2 (bins) are therefore represented as (1.54,-1.35) and 
(2.93, 0.93). We consider these two points as a pair and 
link them using a red line (Figure 3(a)). Similarly, the 5 th 
bin contains four instances of X x -0.26. Their corre- 
sponding values of X 2 are 1.20, 1.37, 1.66, and 1.99, and 
the -S 2 values are 0.05, 0.35, 1.02, and 1.65, respectively. 
The points in the plot of X 2 (bin 5 ) versus -S 2 (bin 5 ) are 
therefore represented as (1.20, 0.05), (1.37, 0.35), (1.66, 
1.02), and (1.99, 1.65). Two points each are considered a 
pair and linked with a red line (Figure 3(b)). After the 
pairs of points are determined, we prune the set by 
neglecting pairs where the distance between X out (pair r ) 
(1) and X out (pair r ) (2) is below some threshold 
d r = \X 0Ut (pair r )(l) - X out (pair r )(2)\. The reason is that 
small line segments tend to lead to unduly high estimation 
errors. The default value for d v is set as 0.2 in the examples 
shown in this article, but it will generally depend on the ac- 
curacy and quantity of the data. The higher the value is, 
the fewer pairs will remain after filtration. However, as 
long as the remaining pairs cover most of the spectrum in 
the X axis, an increase in d r might be preferable. Suppose 
s pairs remain after this filtering. Figure 4(a) shows the 
collective result for the illustration example. 

Equation (12) indicates that X out (bin p ) and -Si (bin p ) 
differ by a constant, since we do not know the value of 
v in_const (birip). This fact translates into a constant vertical 
shift in the y direction for each pair of points. In other 
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Figure 3 (a) The 8 th bin in Figure 2(b) contains two different X 2 values corresponding to the "blue" instances in Figure 2(a) for 
X-i ~1.26. The corresponding values of X 2 and -S 2l obtained from the plot of X 2 versus -5 2 , are (1.54, -1.35) and (2.93, 0.93). These two points are 
considered a pair and linked with a red line, (b) The 5 th bin of Figure 2(b), corresponding to the "red" instances in Figure 2(a), contains four 
instances of X ] -0.26. Their corresponding values of X 2 and -S 2 are (1.20, 0.05), (1.37, 0.35), (1.66, 1.02), and (1.99, 1.65). Two points each are 
considered a pair and linked with a red line. 
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X 2 Xi 



Figure 4 (a) Pairs of points satisfying a threshold value of d (see Methods) greater then 0.2. Seven pairs (s = 7; connected by blue lines) are 

selected for the following steps. The green line is the true functional representation of X 2 versus v 3l which in an actual situation is not known, (b) 

Pairs in (a) are merged, based on the distances between points in each "node" and the distances between two points in a pair, (c) Subgroups of 

pairs in (b) are merged, (d) If the value of v 3 is known for X 2 = 1 or for some other value. The entire cluster of lines is vertically shifted accordingly. 

If small values of X 2 are covered by the pairs, the shift is determined by the observation that a flux is usually zero if the substrate concentration is 

zero. Here, the sum of errors between the estimated points and corresponding points on the true green line is 0.0354. 
k J 



words, the relative y positions of the pair are preserved 
and the pair has to be shifted together by a yet unknown 
amount. While we do not know the size of the shift for 
each individual pair of points, all points collectively repre- 
sent the graph of X out versus v out) and it is reasonable to 
assume that this graph is continuous and usually even 
monotonic. Therefore, the next step is to merge the indi- 
vidual pairs by determining a proper shift for each pair. 

Intuitively, it is easy to see how to shift all pairs so that 
they are close to one continuous line. Automation of the 
process requires an algorithm that is not quite straight- 
forward, but can be facilitated with a graphical user 
interface; technical details of a possible merging process 
are presented in Figure SI of the Additional file 1. A 
pseudo-code of the merging is the following: 

SET each pair of points as a node 
SET each node as a subgraph 
WHILE the graph is not connected 

FOR each subgraph in the graph 

FOR each node in the current subgraph 



SET other-subgraphs as the subgraphs; exclude 
the current subgraph 

CALCULATE the distance from the current 
node to every node contained in other- 
subgraphs 

END FOR 

FIND the shortest distance and its corresponding 
nodes 

CONNECT these two nodes 
END FOR 
END WHILE 

When the merging is completed, all pairs of points are 
close to a relatively smooth line, but the overall shift of 
the group of pairs is not known. We do know that es- 
sentially all metabolic fluxes will have values close to 
zero when their substrate concentration approaches 
zero. Thus, if sufficiently small substrate values are avail- 
able in one of the bins, one easily estimates a reasonable 
shift. Should the flux value associated with some 
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substrate concentration be known, the shift can be 
determined from this information. A further alternative 
is the following. If the inferred trend line suggests that 
the flux follows some rate law, such as a Hill function, 
the parameters of this function, together with the appro- 
priate shift, can be obtained in a single optimization 
step. 

Figure 4(b) shows, for the illustrative example, the 
process of merging and shifting. The human eye has no 
problem accomplishing this task intuitively. In the auto- 
mated process (see Additional file 1), one connects each 
"node" (pair of points) with its closest-neighbor node 
and positioning the pair of points. This process creates 
two sub-groups of points. We recalculate the distance 
between each node in a sub-group with the nodes in the 
other sub-group, determine the closest pair of nodes, 
connect them, and shift the corresponding pairs into 
one sub-group as shown in Figure 4(c). Suppose the 
value of v 3 is known for X 2 = 1. If so, we ultimately shift 
the entire trend accordingly. The result is shown in 
Figure 4(d). A shift based on the association between 
zero flux and zero substrate concentration is an alterna- 
tive, although it does not uniquely prescribe a solution 
in this case. 

Finally, based on the numerical or graphical flux profile 
thus determined, one may test candidate functions that 
capture the flux-substrate relationship. For instance, the 
result in the illustrative example shows that the func- 
tional relationship of X 2 vs. v 3 is s-shaped. It could thus 
be consistent with the (true) Hill function in Eq. (4), 
although the computed result itself certainly would not 
prove that this format is correct. If one assumes, based 
on the results, that a Hill function is appropriate, one 
may fit this functional form to data to find the optimal 
parameter values of the flux-metabolite dependency. 
Without making such an assumption, one may alterna- 
tively connect the dots in Figure 4(d) with a continuous 
line and interpolate the values of fluxes using a spline or 
another smoothing method. The resulting trend line can 
be used as a "look-up" plot. 

Now that we have determined v 3 , it is easy to compute 
v 2 from the measured slopes of X 2 . The plot of v 3 is 
slightly curved, which is consistent with its power-law 
function in Eq. (4), although again, there is no proof. 
The Results section discusses further examples. 

The parameters of any candidate functional form are 
easily estimated, because no differential equations are 
involved and the problem is of low dimension; they rep- 
resent a fully parameterized kinetic model for the flux 
term itself and, subsequently for the differential equa- 
tion. Due to this simplicity, it is even possible to scan a 
variety of candidate functions and assess their appropri- 
ateness. If a suitable functional format can be deter- 
mined with appropriate parameter values, the task is 



completed. If not, one may represent the flux-substrate 
plot with a piecewise-polynomial function, such as cubic 
spline. Even in this non-explicit, numerical format, the 
result is sufficient to reduce one or two degrees of free- 
dom in the overall DFE task. Figure 5 presents the over- 
all flow and concept of the method. 

The procedure described above has generated one or 
two additional flux estimates. For the example in Eq. (3), 
the determination of v 2 and v 3 "fills" the rank, and the 
system can be uniquely solved. In fact, only one of the 
two is needed. For examples where one or two add- 
itional fluxes are not sufficient for a unique solution, the 
same procedure has to be performed with other equa- 
tions until enough fluxes are determined to make the 
flux system full rank. DFE subsequently identifies all 
other fluxes as plots against time or against their sub- 
strates and modulators. 

In cases where fluxes contain more than one variable, 
the time courses have to be screened for combinations 
where the contributing variables have the same values. 
The concepts of the procedure are exactly the same as 
for the univariate case, but the implementation is obvi- 
ously more involved (see Additional file 1). Also, such 
combinations are rarer than in the cases described 
above, so that these situations require more diverse data- 
sets for structure identification. 



Results 

The simple linear pathway shown in the previous section 
illustrated the concepts of the proposed extension to DFE. 
This section describes applications of the proposed meth- 
ods in the context of further didactic and actual examples 
that become increasingly more complicated. We begin 
with two artificial cases with distinct characteristics and 
conclude with the analysis of experimental observations 
describing trehalose metabolism in the yeast Saccharo- 
myces cerevisiae. 

Branched pathway with feedforward activation and 
feedback inhibition 

Consider a branched pathway with fluxes represented 
by various functional forms, including Michaelis- 
Menten and Hill functions with inhibition and activa- 
tion. The pathway, shown in Figure 6(a), can be 
described by the following set of ordinary differential 
equations [30]: 

Xi=vi - v 2 

X 2 =v 2 - v 3 - v 5 / 13 x 
X 3 =v 3 - v 4 
X 4 =v 5 - v 6 
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Figure 5 Flowchart of the proposed method. Starting with 
experimental time series, the data are smoothed and balanced for mass 
conservation, if necessary. The slopes of the time series at each time 
point are estimated. Combined with the knowledge of the system 
topology, substitution of the derivatives in the ODE with slope 
information yields a linear system of fluxes. If the system has full rank, 
solve the system with techniques from linear algebra. If the system is 
underdetermined, use auxiliary steps, as proposed in this article, to solve 
a subset of the fluxes until the system is of full rank. The results are the 
dynamic profiles of all extra- and intra-cellular fluxes in the system. If 
desired, make assumptions regarding the functional forms of the fluxes. 
These functions correspond to symbolic flux representations that can be 
independently fitted to the respective dynamic flux profiles and result in 
a fully parameterized kinetic model. As an alternative each process may 
be approximated as a piecewise function, for instance using spline 
methods. 



The kinetic descriptions for each of the reactions are: 
16 xX 5 
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As before, we use these formats to generate artificial 
data, but subsequently assume no knowledge of their 
characteristics. 

The system in Eq. (13) is not of full rank. Thus, some 
of the fluxes need to be determined with the proposed 
method. For our illustration, we select the third equation 
in Eq. (13), because it contains only two fluxes; also, v 3 
depends only on X 2 , and v 4 depends only on X 3 , which 
we know from the topology of the pathway. In the previ- 
ous example, all time series were oscillating and it was 
easy to find enough data points where one variable is 
fixed and other variables display different values. In the 
present example, each single dataset displays changes 
over time that show few repeated concentration values 
(see Figure 6(c)). In such a situation, which is more typ- 
ical than the earlier illustration example, one can use 
exactly the same method, applied to multiple datasets 
with different dynamic profiles, as long as one can val- 
idly assume that the functional forms are not affected by 
differences among the datasets. 

For this illustration, we simulated multiple datasets 
with the initial values presented in Figure 6(b), which 
could easily reflect actual experimental settings. Figure 7 
(a) shows the result of binning X 3 by using the first four 
datasets in Figure 6(b). The corresponding pairs of X 2 
(Figure 7(b)) range from 0.25 to 2.24, which covers most 
of the range of X 2 in the four datasets (from 0.25 to 
2.34). The merging process of pairs is shown in Figure 8, 
with panels corresponding to those in Figure 4. In par- 
ticular, Figure 8(c) exhibits the merged points, which evi- 
dently form a sigmoidal shape where the first few points 
are basically flat. Therefore, one can assume the flux at 
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Figure 6 (a) Metabolic network with positive feedforward and 
negative feedback. All enzymatic reactions are assumed to follow 
Michaelis-Menten or Hill kinetics except for those corresponding to v 2 
and v 5 , which are assumed to be represented with an Irreversible 
General Hyperbolic Modifier Kinetic function and with an Irreversible Hill 
function with one modifier, respectively (see Eq. (14) for details), (b) Sets 
of initial conditions used to generate six different datasets. (c) Time series 
data corresponding to the first set of initial values in (b); X h X 2 , X 3 , X 4 are 
represented by blue, red, orange, and green dots, respectively. 



the smallest X 2 (-0.25) to be close to zero and shift the 
entire set of merged pairs up by about six units to obtain 
the estimates of v 3 . Indeed, this step recoups the true 
flux, which is shown in green, but would be unknown in 
a real application. Once v 3 is determined, the system of 
Eq. (13) is still underdeter mined and another flux needs 
to be estimated to make the system full rank. The most 
straightforward choice is v 4 , which is directly computed 
from v 3 and the measured slopes of X 3 . 

Instead of v 4 , one could also estimate an additional 
flux from another equation in Eq. (13) using the same 
procedure, for example, by solving v 5 and v 6 in the 
fourth equation. Flux v 6 depends only on X 4 but v 5 




Figure 7 (a) Bins of instances of X 3 for different values; the 
range of each bin is chosen as 0.033. Among the 26 bins, 13 bins 
have at least two X 3 values; the others are discarded, (b) 
Representation of 13 sets of corresponding X 2 values in those bins 
that have at least two X 3 . The bars connect two or more X 2 values 
within each X 3 bin. 



depends on two variables Xi and X 2 . The steps of esti- 
mating v 5 and v 6 are described in Additional file 1. We 
also tested the proposed method by using six datasets in 
Figure 6(b) and the results similarly recover the true 
functional form (data not shown). 

The proposed method was also tested on a five- 
variable system that has been used as a benchmark prob- 
lem in many articles {e.g. [24,31-33]). To demonstrate 
the applicability of the method, we also added artificial 
noise to the time series data in this example and 
randomly picked sub-datasets from data generated with 
ten conditions. The details and results are shown in 
Additional file 1. 

Glycolysis and trehalose production 

This last example describes in a simplified fashion 
how the bakers yeast Saccharomyces cerevisiae 
converts glucose into end products and how trehalose 
is synthesized and degraded in a cyclic pathway 
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Figure 8 (a) Collection of s pieces exceeding a chosen threshold d (here s= 12 and d = 0.2; see Text). The green line is the true functional 
representation of X 2 versus v 3 . (b) Pairs in (a) are merged based on their distances and on the distances between two points in a pair, (c) The 
subgroups of pairs in (b) are merged, (d) The sigmoidal shape of points in (c) suggests that the flux of the smallest X 2 (-0.25) should be close to 
zero. The sum of errors between the estimated points and their corresponding true values (on the green line) is 0.0551. 



(Figure 9). The data [34] consist of actual in vivo 
NMR measurements of metabolic profiles that char- 
acterize how the yeast responds to heat stress in two 
time regimes at the genome, protein, and metabolic 
levels. For the illustration here we use the metabolite 
dynamics of normally grown cells that were then 
exposed to heat stress (39°C) and fed with a pulse of 
glucose. Immediately after glucose addition, the initial 
metabolite pools (G6P and FBP) increase, while trehal- 
ose (Tre) increases with a short delay and begins to 
decrease slightly after two minutes. The end products 
ethanol, glycerol, and acetate gradually accumulate. 
The concentration data are shown as dots in Figure 10, 
together with the modeling results that are described 
next. 

The model contains eight dependent variables and 
eight fluxes, as shown in Eq. (15), where V ext and V int 
represent the extracellular (0.05 L) and intracellular 
(0.00717 L) volume of the bioreactor and the cell popu- 
lation, respectively. Each of the fluxes is a function of 
some of the variables, as shown in Eq. (16), but it is im- 
portant to note that we do not make any assumptions 
regarding the functional forms of the fluxes. In 
principle, DFE seems to be directly applicable. However, 



the time series data contain the measurements of only 
five of the metabolites, namely Glc (X 2 ), G6P (X 3 ), Tre 
(X 4 ), FBP (X 5 ), and extracellularly accumulated end pro- 
ducts (EtOH, Gly, and Ace; X 6 ). Without the measure- 
ments of X 2 , X 7 , and X 8 , the system in Eq. (15) is not of 
full rank and, due to the experimental set-up, v 7 and v 8 
cannot be measured or determined directly by estimat- 
ing slopes. 

To complement the rank of the flux system, we use 
the proposed method of flux estimation. First, one 
should note that the measurements of Glc (Xi) con- 
cern extracellular glucose. Thus, X 1 is easy to measure 
experimentally, but it is very difficult to obtain good 
measurements of intracellular glucose (X 2 ), because it 
is immediately converted in to G6P (X 3 ). Thus, the 
proportion of Glc (X 2 ) is negligible in comparison to 
Glc (Xi), and because the measured concentration of 
glucose is close to the sum of Glc and Glc (X 2 ), 
we merge Xi and X 2 into one pool, which is 
represented by the sum of the first two equations in 
Eq. (15). Furthermore, the amount of material enter- 
ing the pentose phosphate pathway (PPP; X 7 ) is not 
directly measurable, but independent lab experiments 
had indicated that it has a value of approximately 5% 
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EtOH+Gly+Ace(X 6 ) 

Figure 9 Schematic representation of a simplified model of glycolysis and the trehalose cycle in the yeast Saccharomyces cerevisiae 
(adapted from [34]). X, and v, represent dependent variables and fluxes, respectively. One inhibitory interaction is shown in red. Abbreviations: 
X h extracellular glucose; X 2l intracellular glucose; X 3l glucose 6-phosphate; X 4l trehalose; X 5l fructose 1,6-bisphosphate; X 6l extracellularly 
accumulating end products (ethanol, glycerol and acetate); X 7l mass diverted into the pentose phosphate pathway; X 8l mass consumed by other 
pathways {e.g., TCA); v h glucose transport; v 2l hexokinase and glucokinase; v 3 : aggregated step of all enzymatic steps between glucose 
6-phosphate and the production of trehalose; v 4 , trehalase; v 5 , phosphoglucose isomerase and phosphofructokinase; v 6l aggregated step of all 
enzymatic steps between fructose 1,6-bisphosphate aldolase and the release of end-products; v 7l flux into the pentose phosphate pathway; v 8l 
flux towards other pathways (leakage). Metabolites without available experimental measurements are shown in gray. The flux v 6 (blue) is directly 
measurable from the time series of X 6 . Fluxes v 3 and v 4 (green) were estimated using the proposed method. 



X,=(v 3 -v,)/Vi 



of the glycolytic flux; thus v 7 = 0.05 x vs [34]. With To supplement the underdetermined DFE, we select the 
these simplifications, the system can be formulated as: equation X 4 = (v 3 - v 4 )/F int in Eq. (15), since it contains 

only two terms and the measurements of X 3 and X 4 are 
X\— — Vi/Vext available. As before, we fix X 3 at some values 

X2=(vi + 2v 4 — V2)/Vint (Figure 11(a)) and find the corresponding X 4 and -<S 4 

X 3 =(v 2 - 2v 3 - v 5 - v 7 ) /Vim (Figure 11(b)). The merged pairs suggest an approximately 

exponential function (at least for the range of available X 4 ) 
and the minimum of X 4 is very close to zero. For a con- 
v&j/Vint centration close to zero, the value of the flux should be 

close to zero as well. Therefore, the entire cluster of pairs 
is moved up around 4 units, and the updated functional 
plot is shown in Figure 11(d). The corresponding v 3 can 
now be calculated accordingly and transformed to the 
form as fluxes versus time. After the determination of v 3 

F (X X ) anC * V4> ^ 6 s y stem °^ ^ becomes full rank and the 

Vl — 3 rest of the fluxes at each time point can be solved with 

1A> = FoiXo ) 

) { DFE even without knowledge of the times series of X 7 and 
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_ r / v x nificant loss or gain of mass (Figure 11(f)). 
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Figure 10 Experimental metabolite time courses of glucose metabolism determined by in vivo C-NMR in Saccharomyces cerevisiae 
grown under optimal temperature (30°C) with a single pulse of glucose (65 mM) (adapted from [34]). The dots for X h . . ., X 6 are 

experimental measurements, while X 7 was determined from the flux v 7l which was inferred with the methods described in the Text. The lines are 
the result of a model simulation with the inferred fluxes. The end products (ethanol, glycerol and acetate) are summarily represented as X 6 . 



Once we have obtained the time series of all fluxes, we 
can generate the plots of concentrations of metabolites 
that are involved in the enzymatic reactions (see Eq. (16)) 
versus a flux. The results are shown in Figure 12. As a 



validation test, we used these numerical flux representa- 
tions of each enzyme catalyzed reaction to simulate the 
concentration changes of metabolites. The simulation 
results are shown in Figure 10. 




X4(mM) X3(mM) time (mins) 

Figure 1 1 (a) The experimental concentration data (29 time points) were smoothed and interpolated with a spline function, thereby 
yielding metabolite levels of X 3 at about 300 time points. These X 3 values were put into 186 different bins with size 0.03. Among these, 54 
bins have at least two X 3 values, (b) Graph of X 4 values, corresponding to at least two X 3 values in each of the X 3 bins. Selection of s pairs with a 
threshold d of 0.3 (here 5=12; see Text) are selected, (c) Pairs in (b) are merged, (d) Resulting functional plot of X 4 vs. v 4 ; the blue dots represent 
the dots in (c), while the red triangles represent the true plot of X 4 vs. v 4 in the dynamic model; in reality, these would not be known, (e) 
Functional plot of X 3 vs. v 3 (blue dots), calculated from the blue dots in (d), and true values of X 3 (red triangles) according to the dynamic model, 
(f) Confirmation that the total mass (represented as the number of 3-carbon units) does not change appreciably over time. 



Chou and Voit BMC Systems Biology 201 2, 6:84 
http://www.biomedcentral.eom/1752-0509/6/84 



Page 15 of 17 





Xz (mM) Xs (mM) Xz (mM) 



Figure 12 Results from the proposed method and subsequent application of DFE to yeast data from the model in Figure 9. Shown are 
metabolite concentrations against fluxes at different time points (blue dots), connected by inferred trend lines for all fluxes (green lines). 



Discussion 

Of all steps in the generic mathematical modeling 
process, parameter estimation and structure identifica- 
tion continue to be among the most severe bottlenecks 
for modeling biological systems. Until recently, this task 
was typically pursued from the bottom up by using local 
data from individual enzymatic steps. However, modern 
techniques of molecular biology have provided us with a 
strikingly different estimation strategy, namely a top- 
down or inverse approach, which is based on dynamic 
time series data that are being generated with rapidly in- 
creasing frequency and quality. Many recent articles 
have proposed various methods to tackle this inverse es- 
timation problem using time series data. However, none 
of these methods are effective in all cases. Furthermore, 
almost all methods have been focusing on the goodness 
of fit and the speed of the algorithm, but not necessarily 
the quality of fit in terms of the validity of the model, 
extrapolation ability, and predictive power with respect 
to data not used in the estimation. In addition, there has 
been little discussion of the diagnostic tools for data fits 
beyond the residual error. For instance, it is possible that 
a fit is good in terms of the residual error, but that the 
estimated fluxes are incorrect because of numerical 
compensations between terms within the model [10]. 

Dynamic Flux Estimation (DFE) [10] addresses several 
of these issues successfully, but only if the data are ra- 
ther comprehensive. More limiting, DFE requires that 
the system of fluxes is of full rank. When the number of 



fluxes exceeds that of the dependent metabolites, either 
because of the stoichiometry of the pathway or due to 
the lack of measurements of some metabolites, DFE can- 
not be applied directly, because the system of fluxes is 
underdetermined. To supplement such a system, we re- 
cently proposed methods for supplementing DFE with 
other information that may be used as a substitute for 
unknown fluxes [19]. However, these methods are suc- 
cessful only under certain restrictive conditions, for in- 
stance, when the enzymes in the system are well 
characterized under pertinent conditions, sufficient kin- 
etic information is available, and all significant metabolic 
time series are measured. One could also determine 
some of the fluxes within the system by fitting pre- 
selected models to time series data. However, this pre- 
selection requires the definition of functional forms for 
the reactions in question, which in truth are often 
unknown. 

In this article we propose a model-free approach with 
minimal assumptions to supplement DFE with informa- 
tion already embedded in the time series data. The pro- 
posed method starts with the selection of a decoupled 
equation; preferable one that contains a minimal num- 
ber of terms and contributing metabolites. Within this 
equation, we repeatedly fix one or a few variables that 
have constant or very similar values within certain small 
ranges, and find the corresponding values of the vari- 
ables that appear in another flux of the equation. The re- 
sult of this step is a plot of a flux versus a metabolite, 
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with several pairs of points showing the relative posi- 
tions of the true metabolite concentration and the flux 
values in each pair. The position of each pair is initially 
subject to shifting in the y direction by an unknown 
amount. The correct shifting of pairs may be accom- 
plished with an automated or manual merging process 
that, for instance, accounts for the fact that the flux 
value should be zero when the metabolite concentration 
is zero. One could also measure the flux value at some 
metabolite concentration experimentally. Furthermore, if 
an enzymatic rate function is deemed correct and corre- 
sponding kinetic information is available, the vertical 
shift in the flux can be calculated. Once the metabolite- 
flux plots are established for all fluxes, one can select a 
suitable mathematical representation for the entire de- 
pendency or use a piecewise approximation for different 
ranges of data. One could also use the metabolite-flux 
relationships directly as "look-up" plots. 

The proposed method may appear cumbersome or 
even baroque. However, one should consider that it 
solves a problem that so far has not even been 
addressed— let alone solved— with any systematic ap- 
proach. Also, the method is presently likely to suffer 
from a lack of suitable data. But judging by the develop- 
ment of high-throughput experimental methods and the 
number and increasing quality of published time series 
over the past decade, this issue seems to be primarily a 
matter of time. Indeed, one should expect that it will 
soon be feasible to generate strategically selected, mul- 
tiple datasets for the identification of a system, which 
differ slightly in their settings. These datasets must come 
from experiments that do not alter the functional charac- 
teristics of the fluxes in the system but might, for instance, 
measure system responses under modestly different sub- 
strate or inhibitor conditions. At the same time, the data 
should be representative of the dynamics of the system 
within the pertinent ranges of its variables. 

The method involves one step that is subject to bias. 
Namely, the overall shifting of the flux-metabolite rela- 
tionship requires extrapolation or some other information, 
unless metabolite concentrations close to zero are avail- 
able. To resolve this issue, it might be possible to deter- 
mine a reference point for the shift from enzymatic or 
kinetic information. However, in many cases, this informa- 
tion will have been obtained in vitro and possibly under 
different conditions. A more direct approach would be to 
measure a flux value experimentally at some point, for in- 
stance, at the steady state. Such a measurement is rela- 
tively simple when the flux of interest is an input or 
output flux. It might also be possible to measure some 
fluxes directly by estimating the rate of consumption and 
production of the initial substrate or the end product, re- 
spectively. However, the measurements of fluxes at these 
locations are usually of lower interest since they are 



seldom associated with the underdetermined subsystems 
of the internal fluxes. One or more intracellular fluxes 
could also possibly be characterized through measure- 
ments of a suitable isotopomer distribution at steady state 
{i.e. [17,35]), but such data are still rare. Finally, if one has 
valid reason to assume a particular format for a flux, such 
as a Michaelis-Menten or Hill function, the shift may be 
obtained through optimization. All estimation methods 
are negatively affected by noise, and the proposed method 
is no different (see Additional file 1). However, issues of 
noise can be logistically separated from this method to 
some degree. Namely, the time series data used as basis 
for the proposed method may (in fact, should) be 
smoothed in a preprocessing step, for instance with a filter 
[2,26,27]. This well-established step allows an assessment 
of the characteristics of the noise as well as the smoothing 
process itself. Once the data are smoothed and noise is 
thus reduced, the proposed method is applied as if the 
data had been noise free. 

Outside these remaining details, the proposed method 
has several notable advantages. First, no assumptions are 
needed regarding the mathematical representations when 
determining the individual fluxes. Second, the application 
of the method is not limited to a small range of a metabol- 
ite or its flux. Instead, it allows the modeler to examine 
the full spectrum of the functional form, depending on 
how widely the available time series data cover metabolite 
concentrations along the x axis of the metabolite-flux plot. 
Third, even under the condition that some of the time 
series are missing, the proposed method can still re- 
cover—at least to some degree— the governing flux pro- 
files. Finally, since the range of coverage depends on the 
available datasets, we can, arguably for the first time, esti- 
mate how many data points are necessary to identify the 
functional format of a flux or what values of metabolite 
concentrations are needed to cover the concentration 
range of interest. Namely, if it is possible to implement the 
proposed method for a system at hand with sufficient reli- 
ability, then we know that we have enough data to assess 
the range over which the flux can be determined. If a 
wider range needs to be known, additional data will have 
to be made available in that extended range. This insight 
in itself will aid the design of specific experiments that can 
be used to generate more extensive functional plots. 

Conclusions 

In this article we propose a systematic strategy to supple- 
ment and ameliorate the limitations of the method of Dy- 
namic Flux Estimation (DFE). The proposed strategy 
makes no a priori assumptions regarding the model repre- 
sentation and uses instead information embedded in the 
time series data. The results demonstrate that the pro- 
posed approach successfully complements DFE in various 
situations. The method permits the examination of a full 
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spectrum of functional forms, as well as a determination 
of whether at all, to what degree, or within what range, the 
available time series data can be validly represented in a 
particular functional flux format within a pathway model 
Based on these results, one can, arguably for the first time, 
estimate how many data points are required to identify 
the functional format of processes within a system model 
and design experiments to generate data points that genu- 
inely add new information to the parameter estimation 
and structure identification tasks. 
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Additional file 1: This file contains: (1) details regarding the process 
of merging pairs of points; (2) the estimation procedure for a four- 
variable branched pathway and results of two cases where fluxes 
contain more than one variable; and (3) the results of the method 
for a five-variable system where different levels of artificial noise 
were added to the time series data and sub-datasets were 
randomly picked from data generated with ten sets of initial 
conditions. [24,31 33] 



